{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QbRrWtX0PXVr"
      },
      "source": [
        "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/IST-DASLab/sparsegpt/blob/master/demo.ipynb)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mMUp4UrWjp-8"
      },
      "source": [
        "Install dependencies"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "VdbD9blm6j_r",
        "outputId": "47a5db11-b0a6-441e-e812-69a8e7676ded"
      },
      "outputs": [],
      "source": [
        "!pip install -q datasets\n",
        "!pip install -q transformers"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rhSblKg_jter"
      },
      "source": [
        "Clone repository"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "3nCz469NhV3c",
        "outputId": "39acb0f6-445b-401e-e854-6f2bce746d55"
      },
      "outputs": [],
      "source": [
        "!git clone https://github.com/IST-DASLab/sparsegpt"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mbM_bJODjyBg"
      },
      "source": [
        "### Pruning example\n",
        "---"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Om0QSLnLj8JN"
      },
      "source": [
        "Below we will show an example of SparseGPT applied to OPT model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "d9NTGmD4iVK7",
        "outputId": "8b92ddee-42a3-4d12-b8cb-904353518493"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "/content/sparsegpt\n"
          ]
        }
      ],
      "source": [
        "%cd sparsegpt"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "HbiDyjx9j61I"
      },
      "source": [
        "Crerate directory to store prune model(s)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "pJ-jauI-iyvi"
      },
      "outputs": [],
      "source": [
        "!mkdir -p sparse_opt"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "OGLiExo5Ksc4"
      },
      "source": [
        "We will use `opt.py` script to prune the model.\n",
        "Select one of the following OPT versions to fit into colab (with `bitsandbytes` one should be able to use larger 6.7b and 13b models):\n",
        "* facebook/opt-125m\n",
        "* facebook/opt-350m\n",
        "* facebook/opt-1.3b\n",
        "\n",
        "To prune the model select dataset for calibration (`c4`, `ptb` or `wikitext`). The SparseGPT paper uses `c4` by default.\n",
        "\n",
        "One can prune model to uniform sparsity with SparseGPT either with unstructured pruning or semistructured `N:M` pattern.\n",
        "\n",
        "To apply unstructured pruning specify `--sparsity` - floating point number in `[0, 1]`.\n",
        "\n",
        "For semitstructured specify `--prunen` and `--prunem` arguments - integer numbers.\n",
        "\n",
        "To apply magnitude pruning instead of SparseGPT select `--gmp` option.\n",
        "\n",
        "To apply quantization on top of sparsity specify `--wbits`.\n",
        "\n",
        "In the example below we prune `acebook/opt-125m` to 0.5 unstructured sparsity via SparseGPT. Try different options.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "BxucjXmCibnI",
        "outputId": "7d9dd66e-5308-4ecd-f30e-e97861682f96"
      },
      "outputs": [],
      "source": [
        "!python opt.py facebook/opt-125m c4 --sparsity 0.5 --save sparse_opt/opt-125m"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5mrOL92aO5xy"
      },
      "source": [
        "Code above prints perplexity on `wikitext2`, `ptb` and `c4` benchmarks in the end."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AD9Zkgb-O21A"
      },
      "source": [
        "### Compare generations\n",
        "---"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "nSJIGizLkPm8"
      },
      "source": [
        "Let us compare generations produced by the dense and sparse model"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "-GzBUGsXic0o"
      },
      "outputs": [],
      "source": [
        "from transformers import AutoTokenizer, OPTForCausalLM"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Ub-69himlTpZ"
      },
      "outputs": [],
      "source": [
        "device = 'cuda'"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "mQJtRPbekmXu"
      },
      "outputs": [],
      "source": [
        "# load dense model\n",
        "model_dn = OPTForCausalLM.from_pretrained('facebook/opt-125m', torch_dtype='auto').to(device)\n",
        "# load sparse model\n",
        "model_sp = OPTForCausalLM.from_pretrained('sparse_opt/opt-125m', torch_dtype='auto').to(device)\n",
        "# init tokenizer\n",
        "tokenizer = AutoTokenizer.from_pretrained('facebook/opt-125m')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Bqskug9-mXtR"
      },
      "outputs": [],
      "source": [
        "input_text = \"It takes a great deal of bravery\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "fS7YWAAhnatI"
      },
      "outputs": [],
      "source": [
        "input_ids = tokenizer(input_text, return_tensors=\"pt\").input_ids.to(device)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "w61F2J0QoPTi"
      },
      "source": [
        "Completion by dense model:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "o_xY5fSSnK2I",
        "outputId": "8594b1ba-2438-445e-8b4e-264d3dc943dd"
      },
      "outputs": [],
      "source": [
        "output_ids = model_dn.generate(input_ids)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "KRmGPG1tnoci",
        "outputId": "f7c1c71d-6141-47c2-90d8-4bb0d785c4a2"
      },
      "outputs": [],
      "source": [
        "print(tokenizer.decode(output_ids[0].cpu(), skip_special_tokens=True))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_9Zk6UaQpP9C"
      },
      "source": [
        "Completion by sparse model:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Ky5U9elZn-pL"
      },
      "outputs": [],
      "source": [
        "output_ids = model_sp.generate(input_ids)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "azFoDFrxpSJJ",
        "outputId": "47cc4a8a-cb17-47b5-b825-c93966074599"
      },
      "outputs": [],
      "source": [
        "print(tokenizer.decode(output_ids[0].cpu(), skip_special_tokens=True))"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "gpuType": "T4",
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
